Adds pygments library for html formatting of the plain text result preview #21

Open
wants to merge 4 commits into
base: master

dubrousky

Pygments is a Python source code highlighter (http://pygments.org/docs/). The idea is to use it to highlight plain text search results in the preview. The added changes guess the source language from the previewed contents and, if a match is found, render the preview as an HTML page with formatting and line numbers. If Pygments is not installed on the system, the code falls back to plain text.
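A minimal sketch of the behaviour described above, not the code from this patch: guess a lexer from the preview text, render HTML with line numbers, and fall back to escaped plain text when Pygments is missing or no lexer matches. The function name `render_preview` and the `HAVE_PYGMENTS` flag are hypothetical; only the Pygments calls themselves are real API.

```python
import html

try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import guess_lexer
    from pygments.util import ClassNotFound
    HAVE_PYGMENTS = True
except ImportError:
    # Pygments is optional: without it we only produce plain text previews.
    HAVE_PYGMENTS = False


def render_preview(text):
    """Return an HTML fragment for the preview text (hypothetical helper)."""
    if HAVE_PYGMENTS:
        try:
            # Guess the language from the contents alone.
            lexer = guess_lexer(text)
            # linenos=True adds line numbers; noclasses=True inlines the
            # styles so no separate stylesheet is needed.
            formatter = HtmlFormatter(linenos=True, noclasses=True)
            return highlight(text, lexer, formatter)
        except ClassNotFound:
            pass  # no lexer matched: fall through to plain text
    return "<pre>" + html.escape(text) + "</pre>"
```

Calling `render_preview(tdoc_text)` with the extracted text would give a string that can be embedded in the preview page either way.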

@koniu
Owner

koniu commented Dec 29, 2013

Hello. I have to say: a cracking idea! I briefly tested the patch (had to change an else if to elif in one place for it to work) and source code files, manpages, etc do look awesome.

There is a slight problem however: guess_lexer() seems pretty liberal when it comes to deciding how to handle stuff. It results in spurious formatting of emails and other plain text. To give an example: words like 'for' or 'else' get highlighted and font colours get confused when apostrophes get taken for quotation marks. Addition of line numbers for emails and such is pretty questionable too.

I had a quick look at pygments' docs and I'd propose using get_lexer_for_filename() instead of guess_lexer() to solve this problem but perhaps there is a more robust way around. Any ideas?
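To illustrate the proposal, here is a hedged sketch of choosing a lexer from the file name instead of the contents. `get_lexer_for_filename()` raises `ClassNotFound` when no lexer is registered for the name, which gives a natural "leave it as plain text" signal. The helper name `lexer_name_for` is hypothetical and not part of the patch.

```python
try:
    from pygments.lexers import get_lexer_for_filename
    from pygments.util import ClassNotFound
except ImportError:
    # Pygments not installed: never highlight.
    get_lexer_for_filename = None


def lexer_name_for(filename):
    """Return the name of the lexer matching filename, or None.

    None means "do not highlight": either Pygments is unavailable or
    no lexer is registered for this file name pattern.
    """
    if get_lexer_for_filename is None:
        return None
    try:
        return get_lexer_for_filename(filename).name
    except ClassNotFound:
        return None
```

With this approach an email body or an extensionless text file is simply not highlighted, whereas a name such as `script.py` selects a lexer without inspecting the text.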

@dubrousky
Author

Hi. As for recognizing the source file language: if we knew the full path to the file, we could use get_lexer_for_filename() as you mentioned. I did not look much into the recoll API. It is also possible to configure the pygments output (such as line numbers) depending on the mime type, source language, or user preferences. I just did a quick fix to get what I was missing in this tool. I would also add options to webui-standalone.py to set the hostname and port from the command line, and I would suggest an option to limit the bottle web server to serving requests only from localhost, for security reasons.
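One possible shape for the suggested command-line options, as a sketch only. The option names `--host` and `--port`, the default port 8080, and the helpers `parse_args` and `main` are assumptions for illustration; they are not taken from webui-standalone.py. Defaulting the host to 127.0.0.1 means the server only answers local requests unless the user asks otherwise.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical --host/--port options for the standalone server."""
    parser = argparse.ArgumentParser(description="Standalone web UI server")
    # Binding to the loopback address by default keeps the server
    # unreachable from other machines.
    parser.add_argument("--host", default="127.0.0.1",
                        help="address to listen on (default: localhost only)")
    parser.add_argument("--port", type=int, default=8080,
                        help="TCP port to listen on (assumed default)")
    return parser.parse_args(argv)


def main(argv=None):
    """Start bottle with the parsed options (imported lazily)."""
    import bottle  # only needed when the server is actually started
    args = parse_args(argv)
    bottle.run(host=args.host, port=args.port)
```

Under these assumptions, a script calling `main()` could be started with `--host 0.0.0.0 --port 9000` to expose it deliberately, while a bare invocation stays local.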

@ghost

ghost commented Dec 30, 2013

You can get the full path from doc.url (the original doc, not the tdoc which holds the extracted text). You could also conceivably make use of doc.mimetype. This can have 2 origins:

  • For files with extensions in /usr/share/recoll/examples/mimemap, the mime type comes from there.
  • For other files, the mime type comes from "file -i"

So there is a slight risk of instability and system/version dependence with the mime type, and you may be better off using the extension if there is one.
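The preference order suggested here (extension first, MIME type second) could be sketched as below. This uses only the standard library and assumes `doc.url` is a `file://` URL whose path can be read with `urlparse`; the helper name `lexer_hint` and the tuple it returns are hypothetical.

```python
import os
from urllib.parse import urlparse


def lexer_hint(url, mimetype=None):
    """Decide which piece of document metadata to use for lexer lookup.

    Returns ("filename", name) when the URL path has an extension,
    ("mimetype", mimetype) when only a MIME type is available,
    and None when neither is usable.
    """
    # Assumption: the URL looks like file:///path/to/name.ext
    path = urlparse(url).path
    name = os.path.basename(path)
    _, ext = os.path.splitext(name)
    if ext:
        # The extension is stable across systems, so prefer it.
        return ("filename", name)
    if mimetype:
        # May come from mimemap or from "file -i", so treat with care.
        return ("mimetype", mimetype)
    return None
```

The result could then be passed to Pygments' `get_lexer_for_filename()` or `get_lexer_for_mimetype()` respectively, with `None` meaning the preview stays plain text.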
